A TALE OF TWO TEST BATTERIES:

A  COMPARISON  OF  THE  AIR  FORCE  OFFICER  QUALIFYING  TEST 
AND  THE  MULTIDIMENSIONAL  APTITUDE  BATTERY 


SUMMARY 

The  Air  Force  Officer  Qualifying  Test  (AFOQT)  and  Multidimensional  Aptitude  Battery 
(MAB)  were  administered  to  2,233  US  Air  Force  pilot  candidates  to  investigate  the  common 
sources of variance in those batteries. The AFOQT was operationally administered as part of the
officer  commissioning  and  aircrew  selection  testing  requirement.  The  MAB  is  a  clinical  test 
battery  and  was  administered  to  provide  an  intellectual  baseline  to  assist  clinicians  when  it 
becomes  necessary  to  evaluate  pilots  with  cognitive  referral  questions.  A  joint  factor  analysis  of 
the  AFOQT  and  MAB  revealed  that  each  battery  had  an  hierarchical  structure.  The  higher-order 
factor  in  the  AFOQT  previously  had  been  identified  as  general  cognitive  ability  (g).  The 
intercorrelation  between  the  higher-order  factors  from  the  batteries  was  .981,  indicating  that  both 
measured g. Although both batteries measured g and included verbal, spatial, and perceptual
speed  tests,  the  AFOQT  also  included  tests  of  aviation  knowledge  not  found  in  the  MAB. 
Additional  studies  are  required  to  evaluate  the  utility  of  the  AFOQT  for  clinical  assessment  and 
the  MAB  for  officer  and  aircrew  selection. 

INTRODUCTION 

The  Air  Force  Officer  Qualifying  Test  (AFOQT)  is  used  to  qualify  civilians  and  prior- 
enlisted  US  Air  Force  (USAF)  personnel  for  officer  commissioning  through  the  Officer  Training 
School  and  Reserve  Officer  Training  Corps  programs.  It  is  also  used  to  qualify  applicants  who 
pass  other  educational  and  physical  requirements  for  aircrew  training.  The  AFOQT  has  been 
validated for pilot and navigator training (Arth, Steuck, Sorrentino, & Burke, 1990; Carretta,
1992; Carretta & Ree, 1995; Koonce, 1982; Olea & Ree, 1994; Ree & Carretta, 1996; Ree,
Carretta, & Teachout, 1995) and for several other officer jobs (Arth, 1986; Arth & Skinner,
1986; Finegold & Rogers, 1985).

In  1994,  the  Air  Force  Medical  Operations  Agency  began  a  program  to  establish  a 
psychological  testing  baseline  for  Air  Force  pilots.  This  baseline  was  intended  to  assist  clinicians 
when  evaluating  pilots  with  cognitive  referral  questions  (Callister,  King,  &  Retzlaff,  1996; 
Retzlaff,  Callister,  &  King,  1996).  One  of  the  tests  used  to  establish  this  baseline  is  the 
Multidimensional  Aptitude  Battery  (MAB)  (Jackson,  1985).  The  MAB  is  normally  administered 
in  paper-and-pencil  form.  The  USAF  developed  a  computerized  version  which  was  administered 
to  pilot  candidates  during  a  flight  screening  program  (King  &  Flynn,  1995). 

The  purpose  of  this  study  was  to  determine  the  extent  to  which  the  AFOQT  and  MAB 
measure  the  same  constructs.  If  there  is  considerable  overlap  between  the  two  batteries,  further 
research  may  be  directed  toward  using  the  AFOQT  for  clinical  assessment  and  the  MAB  for 
officer  and  aircrew  selection. 
METHOD


Participants 

Participants  were  2,233  US  Air  Force  pilot  candidates  who  completed  the  AFOQT  and  a 
computerized  version  of  the  MAB.  The  sample  had  a  mean  age  of  20.6  years  and  was 
predominantly  male  (92%)  and  White  (87%). 

Measures 

Air  Force  Officer  Qualifying  Test.  The  AFOQT  is  a  paper-and-pencil  multiple  aptitude 
battery  used  for  officer  commissioning  and  aircrew  training  selection  (Skinner  &  Ree,  1987).  It  is 
developed  and  maintained  by  the  USAF.  Administration  time  is  about  4  hours.  The  16  AFOQT 
tests  are  combined  to  create  five  operational  composites:  Verbal,  Quantitative,  Academic 
Aptitude,  Pilot,  and  Navigator-Technical.  It  has  an  hierarchical  factor  structure  and  measures 
general cognitive ability (g) and the lower-order factors of verbal, math, spatial, aircrew
interest/aptitude,  and  perceptual  speed  (Carretta  &  Ree,  1996). 

Multidimensional  Aptitude  Battery.  The  MAB  is  a  broad-based  test  of  intellectual  ability. 
It  was  patterned  after  the  Wechsler  Adult  Intelligence  Scale  (WAIS-R;  full-scale  r  =  .91). 
Although  the  MAB  requires  about  the  same  amount  of  time  to  administer  as  the  WAIS-R  (about 
1.5  hours),  it  can  be  group-administered  and  machine  scored,  while  the  WAIS-R  cannot. 

The  paper-and-pencil  version  of  the  MAB  was  developed  by  Jackson  (1985)  and  the 
computerized  version  by  the  USAF  Armstrong  Laboratory  (Retzlaff,  King,  &  Callister,  1995). 
The  computerized  version  was  developed  and  used  with  the  consent  of  the  test  author  with 
explicit  copyright  permission.  The  two  versions  have  the  same  10  tests  with  identical  items.  The 
tests  are  Information,  Comprehension,  Arithmetic,  Similarities,  Vocabulary,  Digit  Symbol, 
Picture  Completion,  Spatial,  Picture  Arrangement,  and  Object  Assembly.  These  tests  are 
combined to form three composites: Full Scale (all 10 tests), Verbal (first five tests), and
Performance  (last  five  tests). 

The  MAB  was  administered  on  a  386-based  computer  with  a  14-inch  color  monitor. 
Participants  entered  their  responses  using  a  keypad  and  mouse  or  light  pen. 

Procedures 

The  AFOQT  was  completed  as  a  requirement  of  application  for  officer  commissioning 
and/or  aircrew  selection.  The  time  frame  for  AFOQT-testing  varied.  Some  took  the  AFOQT 
near  the  completion  of  high  school  or  while  in  college.  Others  took  it  after  completing  college. 
All  participants  completed  the  MAB  shortly  before  beginning  the  Enhanced  Flight  Screening 
Program. MAB testing was done to establish an idiographic cognitive baseline for the clinical
evaluation of pilots, for comparison if a pilot later sustained a head injury or other neurological
insult.
Analyses


The  participants  represented  a  range-restricted  sample  because  they  had  already  been 
selected  for  college  and  for  an  officer  commissioning  program  based  on  AFOQT  and/or  college 
entrance  exams.  The  Lawley  correction  procedure  (Lawley,  1943;  Ree,  Carretta,  Earles,  & 
Albert,  1994)  was  applied  to  estimate  the  means,  variances,  and  correlations  of  the  tests  as  they 
would  be  found  in  USAF  officer  applicants  (Skinner  &  Ree,  1987).  The  confirmatory  factor 
analyses  were  conducted  using  the  range-restriction-corrected  data  as  it  provided  a  superior 
estimate  of  the  means,  standard  deviations,  and  correlations. 
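
The logic of the correction can be illustrated with the simplest case of Lawley's formulas: one explicit selection variable x and one other test y. The sketch below is not the multivariate procedure applied in this study, which corrects all tests jointly; the function name and the numeric inputs are hypothetical and chosen only for illustration.

```python
import math

def lawley_single_selector(v_xx, v_xy, v_yy, s_xx):
    """Correct restricted-sample (co)variances for selection on x.

    v_xx, v_xy, v_yy: variance of x, covariance of x and y, and variance
    of y in the restricted (selected) sample.
    s_xx: variance of x in the unrestricted reference population.
    Returns (s_xy, s_yy, r_xy) estimated for the unrestricted population.
    """
    s_xy = (s_xx / v_xx) * v_xy
    s_yy = v_yy + (v_xy ** 2) * (s_xx - v_xx) / (v_xx ** 2)
    r_xy = s_xy / math.sqrt(s_xx * s_yy)
    return s_xy, s_yy, r_xy

# Hypothetical inputs: selection has halved the SD of x (variance 1.0
# in the sample vs. 4.0 in the population); the observed r is .30.
s_xy, s_yy, r_corrected = lawley_single_selector(
    v_xx=1.0, v_xy=0.3, v_yy=1.0, s_xx=4.0
)
print(round(r_corrected, 3))
```

With these made-up inputs the correlation rises from .30 to about .53, which shows why correlations computed in a selected sample understate those in the applicant population.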

Hierarchical confirmatory factor analyses (HCFAs) were performed using LISREL 8
(Jöreskog & Sörbom, 1996). The first-order confirmatory factor analysis (CFA) allowed all
observed variables (16 AFOQT and 10 MAB tests) to load on their first-order factors and those
first-order factors to correlate with each other. The first-order factors included the five lower-
order AFOQT factors of verbal, math, spatial, aircrew interest/aptitude, and perceptual speed and
two MAB factors representing the MAB Verbal (first five tests) and Performance (last five tests)
composites. A higher-order CFA was then conducted using the first-order factor intercorrelation
matrix. This higher-order CFA allowed the five AFOQT factors to load on a higher-order general
factor (g_AFOQT) and the two MAB factors to load on a second higher-order general factor (g_MAB).
These two general factors were allowed to correlate and between-battery relationships among the
lower-order factors were examined. Generalized least squares estimation procedures were used.

Although  it  may  appear  that  the  higher-order  factor  is  underdefined  with  only  two 
indicators,  Costner  (1969)  discusses  the  circumstances  under  which  two  indicators  are  sufficient. 
Generally,  it  is  not  required  that  all  correlations  between  different  pairs  of  indicators  be  identical. 
Rather,  it  is  required  that  several  estimates  of  a  single  abstract  coefficient  (e.g.,  factor  loading)  be 
consistent. 

Several fit indices were computed. These included the Comparative Fit Index (CFI)
(Bentler, 1990), Non-Normed Fit Index (NNFI) (Marsh, Balla, & McDonald, 1988), and Root
Mean Square Error of Approximation (RMSEA) (Browne & Cudeck, 1993).
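
For reference, the three indices are commonly defined as follows. These are the usual textbook forms, and the exact variants computed by LISREL 8 may differ slightly. The symbols are assumptions introduced here: chi-square and degrees of freedom for the fitted model (subscript M) and for the baseline independence model (subscript 0), and N for the sample size.

```latex
\begin{align*}
\mathrm{CFI}   &= 1 - \frac{\max(\chi^2_M - df_M,\ 0)}
                         {\max(\chi^2_M - df_M,\ \chi^2_0 - df_0,\ 0)} \\[4pt]
\mathrm{NNFI}  &= \frac{\chi^2_0 / df_0 - \chi^2_M / df_M}
                       {\chi^2_0 / df_0 - 1} \\[4pt]
\mathrm{RMSEA} &= \sqrt{\frac{\max(\chi^2_M - df_M,\ 0)}{df_M\,(N - 1)}}
\end{align*}
```

CFI and NNFI compare the fitted model with the baseline model, so values near 1 indicate good fit; RMSEA expresses misfit per degree of freedom, so smaller values indicate better fit.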

RESULTS AND DISCUSSION

Table  1  shows  the  means  and  standard  deviations  of  the  tests  in  observed  and  corrected- 
for-range-restriction  form.  The  observed  AFOQT  means  were  on  average  about  .90  standard 
deviations above the normative values and the variances were about 77% of the normative
values  for  USAF  officer  applicants  (Skinner  &  Ree,  1987).  The  observed  means  for  the  MAB 
tests  were  about  1  standard  deviation  above  the  normative  value  of  50  and  the  variances  were 
about  54%  of  the  normative  value  of  100  for  adults  (Jackson,  1985).  After  correction  for  range 
restriction  (to  USAF  officer  applicant  norms),  the  MAB  tests  were  still  about  .62  standard 
deviations  above  their  normative  value  and  the  variances  were  about  69%  of  the  adult  normative 
value  of  100.  This  suggests  that  USAF  officer  applicants  are  above  adult  norms  on  the  construct 
measured  by  the  MAB  (i.e.,  intellectual  ability). 
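
The MAB comparisons with adult norms can be reproduced from the Table 1 values with a few lines of arithmetic. The sketch below assumes the summary figures are simple averages across the 10 MAB tests (the text does not state how they were averaged), and the function names are illustrative.

```python
from statistics import mean

NORM_MEAN, NORM_SD = 50.0, 10.0  # MAB adult norms: mean 50, variance 100

# MAB test values from Table 1, in the order INF ... OBJ.
observed_means = [66.80, 59.74, 60.89, 59.82, 60.29,
                  63.10, 59.47, 59.10, 51.95, 58.94]
observed_sds = [6.89, 4.36, 6.23, 8.66, 9.33,
                6.98, 6.43, 8.94, 7.01, 7.58]
corrected_means = [64.36, 58.17, 54.72, 56.14, 58.15,
                   58.15, 56.44, 54.04, 48.33, 53.68]

def sd_units_above_norm(means):
    """Average distance of the test means from the norm, in norm SD units."""
    return mean([(m - NORM_MEAN) / NORM_SD for m in means])

def variance_ratio(sds):
    """Average test variance as a proportion of the normative variance."""
    return mean([sd ** 2 / NORM_SD ** 2 for sd in sds])

observed_gap = sd_units_above_norm(observed_means)
observed_var_ratio = variance_ratio(observed_sds)
corrected_gap = sd_units_above_norm(corrected_means)
print(round(observed_gap, 2), round(observed_var_ratio, 2), round(corrected_gap, 2))
```

Under that averaging assumption, the observed means come out about 1 SD above the norm with about 54% of the normative variance, and the corrected means about .62 SD above the norm, in line with the figures above.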
Table 1.

Means and Standard Deviations for AFOQT and MAB Scores


                                     Observed           Corrected
Score                   Abbr.     Mean      SD       Mean      SD

AFOQT
Verbal Analogies        VA       18.29    3.31      13.36    4.23
Arithmetic Reasoning    AR       18.43    4.57      11.00    4.40
Reading Comprehension   RC       17.93    4.34      15.83    5.93
Data Interpretation     DI       18.81    3.83      11.15    3.93
Word Knowledge          WK       16.86    4.84      13.28    5.83
Math Knowledge          MK       19.87    4.39      14.48    6.04
Mechanical Comp.        MC       11.60    3.72       9.78    3.65
Electrical Maze         EM        8.89    3.31       7.68    4.22
Scale Reading           SR       27.93    5.88      20.07    6.73
Instrument Comp.        IC       15.08    4.13       8.82    4.76
Block Counting          BC       14.22    3.44      10.62    4.39
Table Reading           TR       30.69    5.96      26.46    7.35
Aviation Information    AI       13.31    4.24       8.65    4.08
Rotated Blocks          RB        9.94    2.76       7.59    3.36
General Science         GS       11.43    3.52       8.54    3.66
Hidden Figures          HF       10.89    2.75       9.60    2.76

MAB
Information             INF      66.80    6.89      64.36    7.18
Comprehension           COM      59.74    4.36      58.17    4.60
Arithmetic              ARI      60.89    6.23      54.72    6.60
Similarities            SIM      59.82    8.66      56.14    9.15
Vocabulary              VOC      60.29    9.33      58.15   10.02
Digit Symbol            DIG      63.10    6.98      58.15    7.81
Picture Completion      PC       59.47    6.43      56.44    6.79
Spatial                 SPA      59.10    8.94      54.04    9.68
Picture Arrangement     PA       51.95    7.01      48.33    7.45
Object Assembly         OBJ      58.94    7.58      53.68    8.31

Note.  Means  and  standard  deviations  were  corrected  for  range  restriction  using  the  multivariate  Lawley  (1943) 
procedure. An AFOQT officer applicant sample was used (Skinner & Ree, 1987).


The  correlations  among  the  tests  are  shown  in  Table  2.  The  observed  correlations  (above  the 
diagonal)  were  positive  with  two  exceptions  involving  the  AFOQT  Aviation  Information  test  and 
two  MAB  tests  (AI  and  DIG  =  -.010;  AI  and  SPA  =  -.007).  The  largest  observed  correlation  was 
between  two  AFOQT  math  tests,  AR  and  DI  (.636). 
Table 2.

Correlation Matrix for AFOQT and MAB Scores
Note. Decimals were omitted to conserve space. Correlations above the diagonal were observed. Correlations below the diagonal were corrected
for range restriction. Lawley's (1943) multivariate correction was applied to the tests. An AFOQT officer applicant sample was used as a
reference  group  (Skinner  &  Ree,  1987). 


All  correlations  were  positive  after  correction  for  range  restriction  (below  the  diagonal). 
See  Ree  et  al.  (1994)  for  an  explanation  of  change  in  correlation  sign  after  correction  for  range 
restriction.  The  largest  correlation  after  correction  for  range  restriction  was  between  two  AFOQT 
verbal  tests,  RC  and  WK  (.770)  and  the  smallest  correlation  (.071)  was  between  a  spatial  test 
from  the  AFOQT  (EM)  and  a  verbal  test  from  the  MAB  (VOC). 

The  correlations  among  the  26  tests  were  used  to  estimate  a  seven-factor,  first-order  CFA 
(5 lower-order AFOQT factors and 2 lower-order MAB factors). The χ²(275) was 2,032.791,
the CFI was .974, the NNFI was .970, and the RMSEA was .053. This is evidence of a good fit. The
factor loadings for this lower-order model are shown in Table A1. The resulting correlation
matrix  for  the  lower-order  factors  (Table  3)  was  used  to  estimate  the  hierarchical  model. 
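
As a rough check, the RMSEA can be recomputed from the reported chi-square, its 275 degrees of freedom, and the sample size of 2,233. The sketch assumes the common point-estimate formula with N - 1 in the denominator; LISREL 8 may use a slightly different variant, so only approximate agreement should be expected.

```python
import math

def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square.

    Assumes the common form sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Values reported for the seven-factor first-order model.
rmsea_estimate = rmsea(chi_square=2032.791, df=275, n=2233)
print(f"{rmsea_estimate:.4f}")
```

The result is about .0535, close to the reported value of .053.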

Table  3  shows  the  correlations  among  the  first-order  factors.  They  ranged  from  .450 
(aviation  and  MAB  verbal)  to  .895  (AFOQT  verbal  and  math)  with  a  mean  value  of  .727.  An 
examination  of  the  between-battery  correlations  showed  the  AFOQT  verbal  and  math  factors  to 
have  higher  correlations  with  the  MAB  verbal  factor,  while  the  AFOQT  spatial,  aviation,  and 
perceptual  speed  factors  had  higher  correlations  with  the  MAB  performance  factor.  The  MAB 
verbal  factor  showed  its  highest  between-battery  correlation  with  the  AFOQT  verbal  factor  (.893) 
and  its  lowest  correlation  with  aviation  (.450).  The  MAB  performance  factor  had  its  highest 
between-battery  correlation  with  spatial  (.854)  and  its  lowest  correlation  with  aviation  (.587). 

The  correlation  between  the  two  MAB  factors  was  .787. 
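
The between-battery pattern described above can be read directly from the last two rows of Table 3. The short sketch below copies those correlations and reports, for each AFOQT factor, which MAB factor it correlates with more strongly; the variable and function names are illustrative.

```python
# AFOQT factor -> (r with MAB Verbal, r with MAB Performance), from Table 3.
between_battery = {
    "Verbal": (0.893, 0.768),
    "Math": (0.858, 0.754),
    "Spatial": (0.719, 0.854),
    "Aviation": (0.450, 0.587),
    "Perceptual Speed": (0.530, 0.683),
}

def closer_mab_factor(r_verbal, r_performance):
    """Name the MAB factor with the larger correlation."""
    return "MAB Verbal" if r_verbal > r_performance else "MAB Performance"

pattern = {
    factor: closer_mab_factor(r_v, r_p)
    for factor, (r_v, r_p) in between_battery.items()
}
for factor, mab_factor in pattern.items():
    print(f"{factor}: {mab_factor}")
```

Verbal and Math align with MAB Verbal, and Spatial, Aviation, and Perceptual Speed align with MAB Performance, as stated in the text.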


Table  3. 

First-Order  Factor  Intercorrelations 


                                                          Percep.    MAB       MAB
Factor(a)           Verbal    Math     Spatial  Aviation  Speed      Verbal    Performance

Verbal              1.000
Math                0.895     1.000
Spatial             0.781     0.825    1.000
Aviation            0.560     0.652    0.808    1.000
Perceptual Speed    0.651     0.719    0.834    0.677     1.000
MAB Verbal          0.893     0.858    0.719    0.450     0.530      1.000
MAB Performance     0.768     0.754    0.854    0.587     0.683      0.787     1.000

(a) The first five factors were from the AFOQT and the last two factors were from the MAB.
The hierarchical model is shown in Figure 1. The loadings of the lower-order factors on
their  respective  higher-order  factors  were  high,  ranging  from  .775  to  .976.  This  indicated  that  the 
lower-order  factors  were  essentially  measures  of  their  respective  higher-order  factors.  The  strong 
correlation  between  the  two  higher-order  factors  (.981)  indicated  that  they  measured  the  same 
higher-order  factor.  Because  of  the  strength  of  this  correlation  and  because  the  higher-order 
AFOQT  factor  is  known  to  be  psychometric  g,  it  is  apparent  that  the  higher-order  factor  in  the 
MAB  also  is  g.  General  cognitive  ability  accounted  for  more  variance  than  the  sum  of  the  lower- 
order  factors  for  both  batteries.  The  proportion  of  common  variance  accounted  for  by  g  was 
similar  for  the  two  batteries:  67.2%  for  the  AFOQT  (Carretta  &  Ree,  1996)  and  67.7%  for  the 
MAB. 
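
In a hierarchical model of this form, the correlation implied between a lower-order AFOQT factor and a lower-order MAB factor is the product of the two higher-order loadings and the correlation between the two general factors. The sketch below illustrates that path-tracing rule. The correlation of .981 is the reported value, but the two loadings are hypothetical values chosen from within the reported range of .775 to .976, not estimates from this study.

```python
G_CORRELATION = 0.981  # reported correlation between g_AFOQT and g_MAB

def implied_between_battery_r(loading_afoqt, loading_mab, phi=G_CORRELATION):
    """Model-implied correlation between two lower-order factors that
    load on different, correlated higher-order factors."""
    return loading_afoqt * phi * loading_mab

# Hypothetical loadings, for illustration only.
implied_r = implied_between_battery_r(loading_afoqt=0.95, loading_mab=0.90)
print(round(implied_r, 3))
```

Because the general-factor correlation is so close to 1, the implied between-battery correlation is almost entirely determined by how strongly each lower-order factor loads on g.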


Figure  1.  Hierarchical  Model. 

Note. The higher-order factors were g_AFOQT and g_MAB, respectively. The lower-order AFOQT factors were Verbal,
Math,  Spatial,  Aviation  Interest/Aptitude,  and  Perceptual  Speed.  The  lower-order  MAB  factors  were  MAB  Verbal 
and  MAB  Performance. 


Similar  results  were  reported  by  Sperl,  Ree,  and  Steuck  (1992)  and  by  Stauffer,  Ree,  and 
Carretta  (1996).  Sperl  et  al.  examined  the  relationship  between  the  verbal  and  math  tests  from 
the  AFOQT  and  Armed  Services  Vocational  Aptitude  Battery  (ASVAB).  They  found  a  first 
canonical  correlation  between  the  two  batteries  of  .93  indicating  a  high  level  of  common 
variance.  Stauffer  et  al.  examined  the  common  sources  of  variance  between  all  10  ASVAB  tests 
and  a  set  of  computer-based  cognitive  components  tests.  As  in  the  current  study,  Stauffer  et  al. 
found a strong correlation (.994) between the higher-order factors from the two batteries
indicating  both  higher-order  factors  measured  the  same  construct. 
These results suggest that both the AFOQT and MAB may be acceptable for establishing
a clinical cognitive baseline for USAF pilot trainees. Both batteries measure psychometric g as
well as verbal, spatial, and perceptual speed (the latter two factors are subsumed in the MAB
performance  factor).  However,  it  is  not  clear  that  the  two  batteries  identically  measure  the  lower- 
order  factors. 

The  chief  advantage  of  the  MAB  over  the  AFOQT  for  use  as  a  clinical  assessment  tool  is 
its  similarity  to  standard  clinical  intelligence  tests  such  as  the  WAIS-R.  Air  Force  clinical 
psychologists  routinely  use  the  WAIS-R  to  evaluate  pilots  referred  for  cognitive  assessment. 
Because of its similarity to the WAIS-R, clinicians find it relatively easy to make pre- and
post-incident comparisons using baseline MAB data. If the AFOQT were to be used instead of the
MAB  for  making  pre-  and  post-incident  comparisons,  clinicians  would  need  training  to  become 
more  familiar  with  the  AFOQT  and  its  relation  to  the  WAIS-R  or  MAB. 

Although  the  AFOQT  takes  longer  to  administer  than  the  MAB  (4  hours  vs.  1.5  hours),  it 
is  already  in  operational  use  for  officer  commissioning  and  aircrew  selection  so  would  not 
require  any  special  administration  as  does  the  MAB.  Further,  the  AFOQT  includes  tests  of 
aviation  interest/aptitude  not  covered  by  the  MAB  (i.e.,  Instrument  Comprehension  and  Aviation 
Information).  These  tests  have  been  shown  to  be  useful  for  predicting  pilot  performance  beyond 
measures  of  g  and  specific  cognitive  abilities  such  as  verbal,  math,  spatial,  and  perceptual  speed 
(Olea  &  Ree,  1994;  Ree  &  Carretta,  1996;  Ree,  Carretta,  &  Teachout,  1995).  Therefore,  if  the 
MAB were to be used in place of the AFOQT, it would be desirable to retain at least the aviation
interest/aptitude portions of the AFOQT to ensure no loss of validity for predicting pilot training
performance. 

Additional  studies  are  planned  to  evaluate  the  utility  of  the  AFOQT  for  clinical 
assessment  and  the  utility  of  the  MAB  for  officer  and  aircrew  selection.  If  the  two  batteries  are 
interchangeable,  the  Air  Force  may  be  able  to  save  administration  time  by  using  one  test  for  both 
purposes. 
Table A1.

Factor  Loadings  for  the  Seven-Factor  Lower-Order  Model 


Factors: Verbal, Math, Spatial, Aviation, Perceptual Speed, MAB Verbal, MAB Performance.
Each test's loadings are listed in factor order.

Score     Loading(s)

VA        0.838
AR        0.845
RC        0.896
DI        0.767
WK        0.864
MK        0.795
MC        0.781
EM        0.547
SR        0.386    0.471
IC        0.794
BC        0.454    0.321
TR        0.666
AI        0.756
RB        0.702
GS        0.515    0.322
HF        0.570
INF       0.524
COM       0.596
ARI       0.662
SIM       0.597
VOC       0.649
DIG       0.648
PC        0.652
SPA       0.597
PA        0.580
OBJ       0.715